Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-39582: Add caching for some butler primitives during deserialization #858

Merged
merged 11 commits into from
Jul 4, 2023

Conversation

natelust
Copy link
Contributor

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Often may butler primitives are deserialized at the same time, and
it is useful for these objects to share references to each other.
This reduces load time and memory usage.
Downstream code now depends on refs holding UUIDs. Have the yaml
loader convert old style integer ids to UUIDs early rather than
waiting for downstream cleanups.
@natelust natelust requested a review from andy-slac June 27, 2023 21:26
@timj timj changed the title tickets/DM-39582 DM-39582: Add caching for some butler primitives during deserialization Jun 27, 2023
@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage: 63.79% and project coverage change: -0.13 ⚠️

Comparison is base (6618b21) 88.01% compared to head (ef41fb5) 87.89%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #858      +/-   ##
==========================================
- Coverage   88.01%   87.89%   -0.13%     
==========================================
  Files         269      270       +1     
  Lines       35420    35544     +124     
  Branches     7424     7452      +28     
==========================================
+ Hits        31176    31241      +65     
- Misses       3103     3145      +42     
- Partials     1141     1158      +17     
Impacted Files Coverage Δ
...hon/lsst/daf/butler/core/dimensions/_coordinate.py 87.46% <38.46%> (-1.85%) ⬇️
python/lsst/daf/butler/core/datastoreRecordData.py 89.41% <45.45%> (-6.59%) ⬇️
python/lsst/daf/butler/core/datasets/type.py 84.58% <46.66%> (-2.59%) ⬇️
python/lsst/daf/butler/core/dimensions/_records.py 85.80% <52.94%> (-4.00%) ⬇️
python/lsst/daf/butler/core/persistenceContext.py 59.25% <59.25%> (ø)
python/lsst/daf/butler/core/datasets/ref.py 83.73% <72.72%> (-1.91%) ⬇️
python/lsst/daf/butler/core/quantum.py 87.61% <78.94%> (+0.29%) ⬆️
python/lsst/daf/butler/transfers/_yaml.py 87.50% <90.90%> (+0.47%) ⬆️
python/lsst/daf/butler/_butler.py 78.69% <100.00%> (ø)
python/lsst/daf/butler/core/__init__.py 100.00% <100.00%> (ø)
... and 2 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, one concern is about cache key for DatasetRef maybe needing component. I did not comment on docstring style, I think ruff should not take care of that.

@@ -0,0 +1 @@
Deprecate reconstituteDimensions argument from Quantum.from_simple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think file name should be removal, not api, according to README.

Add backticks around reconstituteDimensions and Quantum.from_simple and period at the end.

python/lsst/daf/butler/core/datasets/ref.py Show resolved Hide resolved
# Minimalist component will just specify component and id and
# require registry to reconstruct
if set(simple.dict(exclude_unset=True, exclude_defaults=True)).issubset({"id", "component"}):
if not (simple.datasetType is not None or simple.dataId is not None or simple.run is not None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rewrite this as simple.datasetType is None and simple.dataId is None and simple.run is None, I think it makes it easier to read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not logically the same thing, We only want to run this when they are all False. But and is greedy, so False will always gobble up anything.

Comment on lines 74 to 77
key = frozenset(dataset_ids)
cache = PersistenceContextVars.serializedDatastoreRecordMapping.get()
if cache is not None and (value := cache.get(key)) is not None:
return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not expect many (or maybe any) cache hits for this. DatastoreRecordData is per-quantum structure, I do not think any two quanta can have the same set of input datasets?

@@ -64,6 +64,8 @@
this version of the code.
"""

_refIntId2UUID = defaultdict[int, uuid.UUID](uuid.uuid4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a workaround for input YAML files that still have integer dataset IDs in them? Should we instead fix those YAML files? It may also be better to generate reproducible UUIDs in that case if you want to keep this map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not be sure all the places where this might be happening, so I opted to fix it here. Good point on the reproducibility I will go with that.

natelust and others added 4 commits June 28, 2023 16:28
@natelust natelust force-pushed the tickets/DM-39582 branch 2 times, most recently from 1178cdb to 11e458f Compare June 30, 2023 22:26
@natelust natelust merged commit d926a5c into main Jul 4, 2023
@natelust natelust deleted the tickets/DM-39582 branch July 4, 2023 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants